Abstract


A key aim of a repeated survey is to allow one or more items to be monitored across time. For 
survey design purposes this aim has often been simplified to two objectives: good estimates of an 
item of interest for each period, and good estimates of period-to-period change in the item. In 
the Australian Labour Force Survey (LFS) these objectives lead to a design with high overlap 
between successive monthly samples. 


Focusing on good estimates of the 'underlying direction' of the series, and how it changes over 
time, could lead to quite different survey designs. Previous work suggests that a sample rotation 
pattern with no month to month overlap would provide better trend estimates. Unfortunately 
such a rotation pattern gives poor estimates of month-to-month change. 


This paper considers an alternative estimator, the linear composite estimator, in combination 
with various sample rotation patterns. A rotation pattern is presented in which individuals are 
sampled for two successive months out of every four months, giving a 50% overlap of sample 
between successive months. By using composite estimation this rotation pattern can yield 
improved estimates of trend while maintaining good estimates of month-to-month change. 


1 Introduction 


1.1 Survey outcomes and sample design 


A key aim of a repeated survey is to allow one or more items to be monitored across time. For 
survey design purposes this aim has often been simplified to two objectives: good estimates of an 
item for each period, and good estimates of period-to-period change in the item. In the 
Australian Labour Force Survey (LFS) these objectives lead to a design with high overlap between 
successive monthly samples. 


This paper suggests that survey designers should take account of objectives related to longer 
term change across time. For many surveys, estimates behave quite erratically from period to 
period. Users interested in policy evaluation or prediction of future values will often be 
attempting to assess the 'underlying direction' of the series, perhaps by using some smoothing
technique or making an assessment 'by eye'. In doing so they are incorporating information from
a number of periods up to the current period. Survey designs that seek to optimise the survey 
for such longer term assessments may be quite different from those that are optimal for period to 
period change. 


Tallis (1995) suggested that high overlap between successive surveys for the LFS reduces the 
ability to detect turning points in the economy. This and work by Sutcliffe and Lee (1995) 
suggest that a sample rotation pattern with no month-to-month overlap would provide better 
estimates of the underlying direction of the series. This paper extends this work by considering 
an alternative estimator, the linear composite estimator, in combination with various sample 
rotation patterns. Composite estimation is not currently used in the LFS, though a different form 
known as the AK composite estimator has been used for many years in the Current Population 
Survey run by the United States Bureau of Labour Statistics (Gurney & Daley 1965). 


Section 2 defines a variety of outcomes for a repeated survey. Besides a number of standard 
estimates, it introduces a 'trend' estimate that attempts to smooth out seasonal effects and local 
irregularities. This trend is introduced as a surrogate for the various methods of assessing the 
underlying direction of the series. Outcomes of interest are measures of the level and rate of 
change of the trend at the end of the series, and also how much the trend at a time point is 
revised as estimates for later times become available. Variance and mean squared error for the 
various outcomes are defined. 


Sections 3 and 4 describe two aspects of survey design that can be changed to alter these survey 
outcomes. Section 3 describes survey rotation patterns, which control the overlap between the 
units selected in the survey for different months. The current, high-overlap pattern for the LFS is 
presented, along with two alternative rotation patterns that would lead to lower overlap between 
successive months. 


Section 4 describes different survey estimators. It presents a class of linear composite estimators 
which make use of data from a number of successive months. These estimators make use of the 
correlation structure of the survey estimates to produce estimators with lower variance than the 
simple estimator. How useful these estimators are depends on the correlation structure and 
hence on the survey rotation pattern. 


Section 5 presents the effects of the available rotation patterns and estimators on the various 
survey outcomes, in the case of the LFS. It is seen that the different designs are good for 
different outcomes, with the current rotation pattern good for month to month change but 
inferior to the other patterns for assessing longer term direction of the series. 


Section 6 gives the conclusions of the paper. While previous studies have presented the impact 
of rotation pattern on trend, this paper is new in assessing the combined impact of composite 
estimation and rotation pattern. One of the rotation patterns presented (the '2 in 2 out' 
pattern) is seen to be quite effective in combination with composite estimation. The final 
message to survey designers is the importance of knowing the key outcomes of the survey and 
using this information in assessing different survey designs. 


2 A discussion of survey outcomes 


2.1 Level and movement objectives 


This section will discuss the variety of different outcomes that can be measured from a repeated 
survey. The stated objectives of the survey will be aimed at some subset of these outcomes. For 
many repeated surveys the only stated objectives relate to the quality of estimates for key items 
at individual time points, and of movement (or change) of these items between successive time 
points. This paper will argue that the objectives of a survey design should be aimed at a wider 
set of outcomes. 


The basic aim of any survey is to provide estimates of various population characteristics with 
sufficient accuracy for the uses to which they are put. In a one-off survey this maps to a fairly 
clear objective — we want to get low bias and low sampling error for one or more key estimates. 


In a repeated survey we wish to provide good estimates not just of values at a single time point, 
but also of how the population is changing over time. These objectives are related, since 
improving the accuracy of estimates at each time point will usually result in a better picture of 
changes over time. Because of this, much sample design work has been focused on obtaining 
good cross-sectional estimates (or level estimates). For this purpose the focus of design work is 
typically the size and composition of the sample and how to use any available extra data such as 
population benchmarks. 


Designing for good level estimates leaves considerable room for affecting the quality of 
longitudinal measures. Consider the estimate of change between two months (the lag one
movement estimate). The sampling error on this estimate depends not just on the
sampling error on the level estimates but also on the correlation between estimates from the two 
months. The best estimates of movement will result from a high correlation — this can often be 
obtained by retaining a large portion of the sample common to the two months. 


The key design parameter affecting the estimates of change is the overlap between successive 
samples. High overlap improves the estimates of lag one movement in cases where a unit's 
responses for an item are positively correlated between successive periods. Maintaining high 
overlap between repeats of a survey is also operationally convenient, since many sampled units 
have been located and have some experience of the survey. 


Many repeated surveys have been designed with estimates of level and lag one movement as the 
sole design objectives. This leads to survey designs that have high overlap between successive 
survey periods. The motivation for such a design is easy to express to users of the survey, and is 
unlikely to raise controversy. 


2.2 Outcomes related to longer term change 


Unfortunately, in many repeated surveys it would be inappropriate for users to base important 
decisions on the movement from one period to the next. The lag one movement may behave 
quite erratically. One reason is sampling error on the estimates — the survey may simply not be 
large enough to detect real period-to-period movements of the size users wish to respond to. A 
second reason is that the true sequence of population values is affected by irregularity — 
short-term and transient changes in the population which have little relationship to policy 
evaluation or prediction of future values. 


To base sensible decisions on such a series, users need a longer term view of changes in the 
population. This requires comparing data over longer periods. A movement over three or four
periods may be used, or some smoothing of the data over time. For a monthly survey, users
may take quarterly averages as a way of smoothing the data. 


Sophisticated users of a repeated survey recognise the danger of responding to the lag one
movement in its own right. This is evidenced by the widespread use of methods aimed
at providing a more reliable long-term picture. However, survey designers have rarely 
recognised estimation of longer-term change as a survey objective. 


It turns out that the best survey designs for estimating longer term change may be very different 
to those that are best for estimating lag one movement. In particular, a low overlap between 
periods may lead to improved estimates of longer-term change. 


2.3 Introducing the ‘trend’ 


It is difficult to define what we mean by longer-term change, which makes it hard to assess this 
aspect of the performance of a survey design. One approach is to produce a variety of measures, 
such as movements at longer lags, or averages over multiple survey periods. We will follow this 
approach in some of the evaluation, demonstrating that the same survey designs are appropriate 
for improving a variety of measures. 


In addition, we will introduce the 'trend' of the series. The term 'trend' here refers to a 
smoothing of the series that attempts to remove short-term irregular variation as well as seasonal 
variation. The trend results from a time series decomposition of the series into trend, seasonal 
and irregular components (and other components such as trading day effects). For a discussion 
of such a decomposition as applied in the Australian Bureau of Statistics see ABS (1987). 


Many statistical agencies use methods of time series decomposition based around the X11 
program (Shiskin, Young & Musgrave 1967). For the purposes of this paper we choose a method 
that was derived as a linear approximation to the X11 method by Dagum, Chhab & Chiu (1996). 
This method is used to represent the sort of trend outcome obtained by time series 
decomposition in most agencies. Because it uses a linear transformation of the survey estimates, 
it is straightforward to analytically derive measures of accuracy of estimates under this trending 
procedure. 


While the formulae and results presented in this paper are specific to the particular trend used, 
they should give a good indication of what users are achieving with similar smoothing 
techniques. This is because all smoothing techniques will produce some sort of average across 
data from nearby time points. In this sense the trend given is presented as a surrogate for what 
statistical agencies and the users of statistics are currently doing to determine the direction of the 
underlying series. 


2.4 Estimates and variances for outcomes 


The variance matrix of the survey estimates 


Let $Y_t$ be the true population value of the item of interest at time $t$, and let $y_t$ be the survey
estimate for time $t$. Write $Y = \{Y_t\}$ and $y = \{y_t\}$ as column vectors containing this data for times
$t = 1, \ldots, N$.

The survey estimates are assumed to be unbiased, and standard methods can be used to 
calculate their variances and covariances using the survey data. The variance-covariance matrix 
(or, more simply, variance matrix) of the survey estimates is given by 


$$V = E\,(y - Y)(y - Y)'$$


for E( ) indicating expectation across possible samples. 


In practice it is often appropriate to smooth estimates of variance and covariance across time to 
obtain the estimate of V. This requires making some assumptions about the stationarity of the 
sampling error. Assuming that variances are constant over time, and that correlations depend 
only on the lag between the times, gives the model $\mathrm{var}(y_t) = \sigma^2$ and $\mathrm{cov}(y_t, y_{t-k}) = \sigma^2 \rho_k$.


Variance of estimates based on multiple months of data 


Let $\alpha = \{\alpha_t\}$ be a vector of parameters that define a linear combination $\alpha' y = \sum_t \alpha_t y_t$ of the
survey estimates. The variance of such a linear combination is given by

$$\mathrm{var}(\alpha' y) = E(\alpha' y - \alpha' Y)(\alpha' y - \alpha' Y)' = \alpha' V \alpha.$$

This formula can be used to obtain variances for movements at various lags, or for other derived
estimates such as quarterly averages. For example, the lag one movement uses
$\alpha' = (0\ 0\ \ldots\ 0\ {-1}\ 1)$. The movement between two quarterly averages would use

$$\alpha' = \left(0\ 0\ \ldots\ 0\ \ {-\tfrac{1}{3}}\ \ {-\tfrac{1}{3}}\ \ {-\tfrac{1}{3}}\ \ \tfrac{1}{3}\ \ \tfrac{1}{3}\ \ \tfrac{1}{3}\right).$$

Under the simple stationarity assumptions given above, the lag $k$ movement has variance given
by $\mathrm{var}(y_t - y_{t-k}) = 2\sigma^2(1 - \rho_k)$. It is clear that this variance will be smaller for a survey design
which gives large correlation at lag $k$.
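
As a minimal sketch of the calculation above, the following Python fragment builds $V$ under the stationarity model and evaluates $\alpha' V \alpha$ for the lag one movement and for the movement between two quarterly averages. The autocorrelation function `rho` (here $0.8^k$), the unit variance and the series length of 12 are hypothetical values chosen only for illustration; they are not LFS estimates.

```python
def variance_matrix(n, sigma2, rho):
    """V[s][t] = sigma2 * rho(|s - t|): the stationary model from the text."""
    return [[sigma2 * rho(abs(s - t)) for t in range(n)] for s in range(n)]


def quad_form(alpha, V):
    """Return alpha' V alpha for a list alpha and a square list-of-lists V."""
    return sum(a * V[i][j] * b
               for i, a in enumerate(alpha)
               for j, b in enumerate(alpha))


def rho(k):
    # Hypothetical autocorrelation at lag k, for illustration only.
    return 0.8 ** k


n, sigma2 = 12, 1.0
V = variance_matrix(n, sigma2, rho)

# Lag one movement: (0 0 ... 0 -1 1)
lag_one = [0.0] * (n - 2) + [-1.0, 1.0]
# Movement between the last two quarterly averages
quarterly = [0.0] * (n - 6) + [-1 / 3] * 3 + [1 / 3] * 3

var_lag_one = quad_form(lag_one, V)
var_quarterly = quad_form(quarterly, V)
print(var_lag_one, var_quarterly)
```

With these assumed values the lag one result agrees with the closed form $2\sigma^2(1 - \rho_1)$; other outcomes only require a different vector $\alpha$.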


Outcomes related to trend estimates 


Under the linear approximation to X11, the trend for any time point is a linear combination of 
values at a number of time points. Assume that N, the number of time points available, is large. 
Let $M = \{t : N - m < t \le m\}$ be a set of time points defining the middle of the series: points far
enough from the beginning and end of the series that adding more estimates would not 
appreciably affect the trend for time points in M. 


Let $T_M$ be the matrix which gives trend values for time points in $M$ based on the $N$ data points.
The true trend for points in the middle of the series is defined to be $T_M' Y$. That is, the true trend
is the result of applying our trending method to the series of true population values for a 
sufficiently large number of times before and after the period of interest. 


The estimated trend for points in $M$ based on observed data for all $N$ time points is given by $T_M' y$.
We call $T_M' y$ the mid trend.


At time m data is only available for points up to this time point. The trend based only on data up 
to time $m$ will be called the end trend at time $m$. Let $T_E$ be the matrix which gives trend values
for points in $M$ based only on data points up to $m$, so that $T_E' y$ is the end trend. This is not
unbiased for the true trend, since its expectation is $T_E' Y$ rather than $T_M' Y$.

Outcomes of interest are given in the form $\alpha' T_E' y$ (end estimates) or $\alpha' T_M' y$ (mid estimates). We
define three outcomes that appear critical: level of trend uses $\alpha' = (0\ 0\ \ldots\ 0\ 1)$, movement of
trend (at lag one) uses $\alpha' = (0\ 0\ \ldots\ 0\ {-1}\ 1)$ and curvature of trend uses
$\alpha' = (0\ 0\ \ldots\ 0\ 1\ {-2}\ 1)$.


Movement of the trend may be more important to users than its level. Users are often interested 
in turning points, where the trend changes from increasing to decreasing. This clearly is related 
to trend movement. Curvature of the trend is the second difference of the trend, and it is 
concerned with changes in the movement of the trend (such as at a turning point, where the 
movement of the trend changes sign). Such changes are also of key interest to users, and it 
seems clear that a good estimate of turning point requires a small sampling error on the change 
in trend movement between successive time points, i.e. on the curvature. 


Finally, for any trend outcome the value at the end of the series is modified as estimates for later 
months become available. The trend revision for a given outcome will be defined as the 
difference between its value at the end of the series (based only on data to time $m$) and its value
in the middle of the series (i.e. after all revisions). The revision is thus given by $\alpha' T_E' y - \alpha' T_M' y$.
It is desirable for the trend revision to be as small as possible. 


Mean squared error and revision for trend outcomes 


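The variance of a mid trend estimate $\alpha' T_M' y$ is given by

$$\mathrm{var}(\alpha' T_M' y) = E(\alpha' T_M' y - \alpha' T_M' Y)(\alpha' T_M' y - \alpha' T_M' Y)' = \alpha' T_M' V T_M \alpha.$$

The mean squared error of the corresponding end trend estimate $\alpha' T_E' y$ is

$$\mathrm{mse}(\alpha' T_E' y) = E(\alpha' T_E' y - \alpha' T_M' Y)(\alpha' T_E' y - \alpha' T_M' Y)' = \alpha' T_E' V T_E \alpha + \alpha' (T_E - T_M)' Y Y' (T_E - T_M) \alpha.$$

The first term here is the variance $\mathrm{var}(\alpha' T_E' y)$ of the end trend estimate, while the second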
term is the squared revision that would occur given the true data. This second term is due to the 
bias which arises because the end trend does not predict the true trend perfectly even in the 
absence of sampling error. This term is independent of the sample design. 


The mean squared revision matrix for this outcome is given by 

$$E(\alpha' T_E' y - \alpha' T_M' y)(\alpha' T_E' y - \alpha' T_M' y)' = \alpha' (T_E - T_M)' V (T_E - T_M) \alpha + \alpha' (T_E - T_M)' Y Y' (T_E - T_M) \alpha.$$

Both the mean squared error at the end and the mean squared revision contain a component 
that does not depend on survey design. Since we are focused on the effect of sample design it is 
appropriate to exclude this component from our measurements. So for estimates of $\alpha' T_M' Y$ the
key measures to calculate are the variance of the trend estimates, $\mathrm{var}(\alpha' T_E' y)$ and
$\mathrm{var}(\alpha' T_M' y)$, and the variance of the revisions, given by


$$\mathrm{var}(\alpha' T_E' y - \alpha' T_M' y) = \alpha' (T_E - T_M)' V (T_E - T_M) \alpha.$$
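
The three design-dependent measures can be illustrated with a deliberately small example. The sketch below is not the linear approximation to X11 used in this paper: it assumes a toy mid trend (a centred three-term average) and a toy end trend (an average of the two most recent estimates), and the correlations passed in are hypothetical. It only shows how the end variance, the mid variance and the revision variance for the level of trend all follow from the same matrix $V$.

```python
def quad_form(w, V):
    """Return w' V w for a weight list w and a square list-of-lists V."""
    return sum(a * V[i][j] * b
               for i, a in enumerate(w)
               for j, b in enumerate(w))


def trend_variances(rho1, rho2, sigma2=1.0):
    """Variances of end trend, mid trend and revision for the level at time m.

    Survey estimates are ordered (m-1, m, m+1). Both filters are toy
    stand-ins chosen for illustration, NOT the X11-based trend of the paper.
    """
    corr = [[1.0, rho1, rho2],
            [rho1, 1.0, rho1],
            [rho2, rho1, 1.0]]
    V = [[sigma2 * c for c in row] for row in corr]
    mid = [1 / 3, 1 / 3, 1 / 3]   # weights of the toy mid trend (uses m+1)
    end = [1 / 2, 1 / 2, 0.0]     # weights of the toy end trend (data to m only)
    diff = [e - c for e, c in zip(end, mid)]   # weights of the revision
    return quad_form(end, V), quad_form(mid, V), quad_form(diff, V)


# Hypothetical lag one and lag two correlations of the survey estimates.
var_end, var_mid, var_rev = trend_variances(rho1=0.8, rho2=0.7)
print(var_end, var_mid, var_rev)
```

In this toy setting the variance of each average grows with the correlations, which is the mechanism by which sample overlap affects smoothed outcomes.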


3 Impact of survey rotation pattern 


3.1 Rotation pattern in the Labour Force Survey 


Methods for controlling overlap between successive survey samples will depend on the nature of 
the repeated survey. We will describe overlap control that uses a fixed survey rotation pattern. 
The details will be from the LFS, a monthly household survey that controls overlap by using a 
rotation pattern. Much of this description will apply straightforwardly to similar household 
surveys. 


The LFS is a survey of the civilian population of Australia aged 15 years or older. Dwellings are 
selected first by selecting geographic areas, and then by choosing a cluster of dwellings from 
each area. Data are collected for all in-scope individuals in these dwellings. 


The initial stage of this multistage selection process is to select geographic areas. These are 
divided into eight 'rotation groups' which are used to control rotation of dwellings into and out 
of the survey. 


The current ‘rotation pattern’ in the LFS consists of sampling the same dwellings from a rotation 
group each month for eight months. In the next month new dwellings from the same 
geographic areas are selected, and they will be sampled for eight more months. The month in 
which new dwellings are selected is different for each rotation group, so that every month one of 
the rotation groups contains new dwellings. 


This rotation pattern ensures that there is an overlap between sampled dwellings in 
seven-eighths of the geographic areas between any two successive months. This gives high 
correlations between successive estimates from the same rotation group. 


3.2 Alternative rotation patterns 


The current LFS rotation pattern is referred to as '8 in', since a new set of dwellings remains in 
sample for eight months. This paper focuses on two alternative patterns which result in reduced 
correlation between successive months. 


The first alternative will be referred to as the '1 in 2 out' pattern. In this rotation pattern each 
dwelling is sampled once a quarter up to a total of eight times. In the other months of the 
quarter, different dwellings from the same geographic regions would be sampled. This rotation 
pattern would produce no sample overlap from month to month. 


The second alternative will be called the '2 in 2 out' pattern. In this rotation pattern each 
dwelling is sampled two months in a row out of every four months, for a total of eight times in 
sample. Different dwellings from the same geographic regions would be sampled on the other 
two months of the four. With this rotation pattern half of the sample would be common to 
consecutive months. 


These patterns can be varied by reducing or increasing the number of times each dwelling is 
sampled. The specific patterns compared in this paper sample each dwelling eight times, and for 
the same sample size they require the same number of geographic areas. So the methods have a 
similar cost to maintain, and the sample at any time point will be equally clustered under each of 
the rotation patterns. 
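
The overlap implied by each pattern can be checked with a short simulation. The sketch below is an illustrative model, not the LFS selection system: it assumes eight rotation groups staggered by one month, labels the dwelling set each group interviews in each month, and counts the share of groups interviewing the same dwellings at two time points. The function names and the simplification that all interleaved dwelling sets within a group retire together are assumptions made for this example.

```python
from fractions import Fraction

GROUPS = 8   # rotation groups, as in the LFS
TIMES = 8    # each dwelling set is interviewed eight times in total


def dwelling_set(n_in, n_out, group, month):
    """Label of the dwelling set interviewed by `group` in `month`.

    Within a group's areas, (n_in + n_out) / n_in interleaved dwelling
    sets take turns being in sample for n_in consecutive months.
    """
    cycle = n_in + n_out
    u = month + group                     # stagger the groups by one month each
    stream = (u % cycle) // n_in          # which interleaved set is in sample
    generation = (u // cycle) // (TIMES // n_in)   # changes when sets retire
    return (group, stream, generation)


def overlap(n_in, n_out, lag=1, months=range(240)):
    """Average share of groups with common dwellings at months t and t + lag."""
    same = sum(
        dwelling_set(n_in, n_out, g, t) == dwelling_set(n_in, n_out, g, t + lag)
        for t in months
        for g in range(GROUPS)
    )
    return Fraction(same, GROUPS * len(months))


for name, (n_in, n_out) in {"8 in": (8, 0),
                            "1 in 2 out": (1, 2),
                            "2 in 2 out": (2, 2)}.items():
    print(name, overlap(n_in, n_out))
```

Under these assumptions the lag one overlaps come out as 7/8, 0 and 1/2 respectively, matching the descriptions above; calling `overlap(n_in, n_out, lag=k)` gives the overlap at other lags.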


Other statistical agencies use different rotation patterns for their labour force surveys. Statistics 
Canada uses a '6 in' pattern. The United States Bureau of Labour Statistics uses a '4 in 8 out'
pattern, while Japan uses a '2 in 10 out' pattern. These last two patterns allow considerable
overlap between samples a year apart, with the objective of improving estimates of year-to-year 
movement. Both the alternative LFS patterns presented above also allow overlap a year apart. 


4 Impact of composite estimation


4.1 Simple estimates 


Let $\tilde{y}_{r,t}$ be an estimate of $Y_t$ based on data from the $r$th out of $R$ rotation groups. Define the
series of simple estimates $y^S = \{y^S_t\}$ in which the estimate for a given time point is the mean of
the rotation group estimates for that time point (i.e. $y^S_t = \frac{1}{R} \sum_r \tilde{y}_{r,t}$).


This simple estimate may differ somewhat from the standard survey estimate, since the survey 
estimates are typically not calculated as the mean of the rotation group estimates. The simple 
estimates are used in this paper as proxies for the standard survey estimates. 


4.2 Linear composite estimates 


The simple estimates at a time point depend only on survey values obtained at that time point. 
By using values obtained at nearby time points it is possible to improve on these simple 
estimates by taking advantage of the autocorrelations between estimates at the rotation group 
level. 


Let $\tilde{y}_W = \{\tilde{y}_{r,t}\}_{r = 1, \ldots, R;\ t \in W}$ be a column vector of the rotation group estimates based on a set of
times $W$ (known as the window). For example, a window of size $L$ at time $m$ would be the set of
times $W = \{t : m - L < t \le m\}$. Each of the rotation group estimates $\tilde{y}_{r,t}$ is unbiased for the
corresponding population value $Y_t$. Let $C_W$ be the matrix such that $E(\tilde{y}_W) = C_W Y_W$, for $Y_W$ the
true population values for times in the window $W$.


Define a linear composite estimator as a linear combination $\beta' \tilde{y}_W$ of the rotation group estimates
which is unbiased for the value of interest. The expected value of the linear combination $\beta' \tilde{y}_W$ is
given by

$$E(\beta' \tilde{y}_W) = \beta' E(\tilde{y}_W) = \beta' C_W Y_W.$$

To obtain an unbiased estimator of an outcome $\alpha' Y_W$ requires imposing the constraints
$C_W' \beta = \alpha$.
The optimum choice of $\beta$ minimises the variance of the composite estimator
(i.e. $\mathrm{var}(\beta' \tilde{y}_W) = \beta' \mathrm{var}(\tilde{y}_W) \beta$) under these constraints. The matrix $\mathrm{var}(\tilde{y}_W)$ is the variance matrix
of the rotation group estimates, which depends on the rotation pattern being used.


Using standard results for minimisation of a quadratic form under linear constraints (see, for
example, Rao 1973, p. 65) the optimal $\beta$ is given by $\beta^W(\alpha) = \mathrm{var}(\tilde{y}_W)^{-1} C_W Q^- \alpha$, where $Q^-$ is any
generalised inverse of $C_W' \mathrm{var}(\tilde{y}_W)^{-1} C_W$. Writing $B^W = \mathrm{var}(\tilde{y}_W)^{-1} C_W Q^-$ this reduces to

$$\beta^W(\alpha) = B^W \alpha.$$

Thus $y^W = B^{W\prime} \tilde{y}_W$ is the linear composite estimator based on the window $W$ that is unbiased
for $Y_W$ and has minimum variance. The optimal linear composite estimator of an outcome $\alpha' Y_W$
is $\alpha' y^W$.
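
The optimal weights can be computed directly for a small case. The sketch below assumes a hypothetical design with two rotation groups and a window of two months, in which group 1 keeps the same dwellings (correlation 0.6 between months) and group 2 changes dwellings (correlation 0.1); estimates from different groups are taken to be uncorrelated and all variances are set to one. These numbers, and the helper functions `solve`, `matmul` and `transpose`, are assumptions for illustration. Because $C_W' \mathrm{var}(\tilde{y}_W)^{-1} C_W$ is invertible here, an ordinary inverse is used in place of a generalised inverse.

```python
def solve(a, b):
    """Solve a x = b (b may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(map(float, ra)) + list(map(float, rb)) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


rho_same, rho_diff = 0.6, 0.1   # hypothetical correlations between the two months
# Rotation group estimates ordered (group 1, month 1), (group 2, month 1),
# (group 1, month 2), (group 2, month 2).
V = [[1.0, 0.0, rho_same, 0.0],
     [0.0, 1.0, 0.0, rho_diff],
     [rho_same, 0.0, 1.0, 0.0],
     [0.0, rho_diff, 0.0, 1.0]]
C = [[1, 0], [1, 0], [0, 1], [0, 1]]   # each estimate is unbiased for its month
alpha = [[0.0], [1.0]]                 # outcome: level in the second month

Vinv_C = solve(V, C)                       # var^{-1} C
Q = matmul(transpose(C), Vinv_C)           # C' var^{-1} C
beta = matmul(Vinv_C, solve(Q, alpha))     # var^{-1} C Q^{-1} alpha

simple = [[0.0], [0.0], [0.5], [0.5]]      # simple estimator for the second month
var_composite = matmul(transpose(beta), matmul(V, beta))[0][0]
var_simple = matmul(transpose(simple), matmul(V, simple))[0][0]
print([round(b[0], 3) for b in beta], var_composite, var_simple)
```

With these assumed correlations the composite estimator has a variance of roughly 0.46 against 0.5 for the simple estimator, and the weights satisfy the unbiasedness constraint $C_W' \beta = \alpha$.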

The dependence of the composite estimators on the window $W$ is important, as different
windows will give different estimators. Note that all estimates based on the same window will
have the desirable property that the estimates of $\alpha_1' Y_W$ and $\alpha_2' Y_W$ will add to the estimate of
$(\alpha_1 + \alpha_2)' Y_W$. This means, for example, that the estimate of movement between times in the
window is simply the difference between the composite estimates of level at the time points.


4.3 Composite estimators and revisions 


In a repeating survey the first composite estimate available for a time point $m$ will be based on a
window of points $W = \{t : m - L < t \le m\}$. It is possible to update previous estimates to be based
on this same window. This will improve the estimates for those time points, and ensures that
other estimates such as movement estimates will be optimal. Unfortunately it will result in
revisions of the survey estimates as new data arrives.


A sensible approach is to use a fixed size of window L for composite estimation, and to allow a 
fixed number R of revisions. When a new month of data arrives, the window is moved and 
optimal composite estimates are computed for the most recent time point and the previous R 
time points. Estimates for earlier time points are left fixed at their last computed value. 


With a large window and a sufficient number of revisions, the composite estimates from this 
approach will be nearly optimal for any linear combination of the population characteristics. 
They will, for example, be nearly optimal for estimating trend and items such as movement of 
trend. With no revisions the only estimate that is optimal is the end level estimate. Nevertheless, 
a strategy with no revisions is attractive to users, and some evaluation of this option will be
presented.


Looking at the size of revisions to the trend is complicated by using composite estimates that are 
themselves revised. Suppose we write $y^E$ as the vector of composite estimates available at time
$m$, and $y^M$ as the vector of composite estimates available at a later time $N$ when $m$ is in the
middle of the series. The trend revision on a composite estimate of $\alpha' T_M' Y$ becomes

$$\alpha' T_E' y^E - \alpha' T_M' y^M.$$

The elements of $y^E$ and $y^M$ are linear combinations of the rotation group estimates $\tilde{y}_{r,t}$, and so the
variance of this trend revision can be calculated based on the variance matrix of these estimates.


5 Outcomes for various survey designs


5.1 Estimates for rotation groups in the Labour Force Survey 


For the calculations in this paper, monthly estimates of persons by labour force status were 
obtained for each rotation group, categorised by month, sex, age (grouped as 15-19, 20-24, ... , 
50-54, 55-64, 65+) and part-of-State (14 geographic regions covering Australia). Within these 
categories, the estimates for each rotation group were pro rated to match known population 
benchmarks. The quantities of interest investigated in this paper are the proportion employed 
and the proportion unemployed, where proportions are of the civilian population aged 15 or 
more. 


The autocorrelation structure of these rotation group estimates has been discussed in previous 
papers — Bell and Carolan (1998) and Bell (1998). The following model for the autocorrelations 
is assumed: 


$$\mathrm{Corr}(\tilde{y}_{r,t}, \tilde{y}_{r,t-k}) = \begin{cases} \rho_{Wk} & \text{for estimates from the same rotation group and the same set of dwellings} \\ \rho_{Bk} & \text{for estimates from the same rotation group but from different sets of dwellings} \end{cases}$$


This model assumes that the sampling error autocorrelation in a rotation group depends only on 
the lag and on whether the rotation group has a common sample of dwellings between the two 
time points. The values $\rho_{Wk}$ and $\rho_{Bk}$ will decrease as lag $k$ increases, with $\rho_{Wk} \ge \rho_{Bk}$. In the
case of the LFS, the following four-parameter model fits the autocorrelations well on data up to
lag seven:

$$\rho_{Wk} = (1 - r_1^2)\left(\theta_2^k r_2^2 + \theta_1^k (1 - r_2^2)\right) \qquad (10)$$

$$\rho_{Bk} = (1 - r_1^2)\,\theta_1^k (1 - r_2^2) \qquad (11)$$


The current rotation pattern does not allow rotation groups to have common dwellings at lags 
over seven months, so the model was used to extrapolate the autocorrelations for these longer 
lags. It appears that the results are not very sensitive to this extrapolation. For discussion of the 
model, including interpretation of the parameters $r_1$, $r_2$, $\theta_2$ and $\theta_1$, please refer to Bell and
Carolan (1998) and Bell (1998). The correlations assumed for this paper at various lags are
shown in table 5.1. They assume the fitted parameter values $\theta_2 = 0.87697$, $\theta_1 = 0.94$, $r_1 = 0.3101$
and $r_2 = 0.90456$ for proportion employed and $\theta_2 = 0.81164$, $\theta_1 = 0.94$, $r_1 = 0.50038$ and
$r_2 = 0.91713$ for proportion unemployed. Standard errors $\sigma_S$ assumed for the simple estimates
are 0.21 percentage points for proportion employed and 0.11 percentage points for proportion 
unemployed. 


Table 5.1: Estimated autocorrelations of rotation group estimates


                                                 Lag k
Estimate and quantity of interest      1     2     3     4     8    12    18

Correlation, same dwellings
  Proportion employed               0.80  0.71  0.64  0.57  0.36  0.23  0.12
  Proportion unemployed             0.62  0.52  0.44  0.37  0.19  0.11  0.05

Correlation, different dwellings
  Proportion employed               0.15  0.15  0.14  0.13  0.10  0.08  0.05
  Proportion unemployed             0.11  0.11  0.10  0.09  0.07  0.06  0.04


Using these autocorrelations the variance matrix $\mathrm{var}(\tilde{y}_W)$ can now be produced for any given
rotation pattern and window W. 
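
The assumed autocorrelations can be regenerated from the fitted parameters. The sketch below assumes the four-parameter model has the form $\rho_{Wk} = (1 - r_1^2)(\theta_2^k r_2^2 + \theta_1^k (1 - r_2^2))$ and $\rho_{Bk} = (1 - r_1^2)\,\theta_1^k (1 - r_2^2)$, with $\theta_1 = 0.94$ taken as the decay rate shared by both correlations. The function name and the parameter labels are choices made for this example; under this reading the computed values agree with table 5.1 to two decimal places.

```python
def autocorrelations(k, r1, r2, theta1, theta2):
    """Return (rho_W, rho_B) at lag k under the assumed four-parameter model.

    rho_W: same rotation group, same set of dwellings.
    rho_B: same rotation group, different sets of dwellings.
    """
    scale = 1.0 - r1 ** 2      # overall factor (1 - r1^2)
    share = r2 ** 2            # part that decays at rate theta2
    rho_b = scale * theta1 ** k * (1.0 - share)
    rho_w = scale * theta2 ** k * share + rho_b
    return rho_w, rho_b


# Fitted parameter values quoted in the text.
EMPLOYED = dict(r1=0.3101, r2=0.90456, theta1=0.94, theta2=0.87697)
UNEMPLOYED = dict(r1=0.50038, r2=0.91713, theta1=0.94, theta2=0.81164)

for k in (1, 2, 3, 4, 8, 12, 18):
    w_emp, b_emp = autocorrelations(k, **EMPLOYED)
    w_une, b_une = autocorrelations(k, **UNEMPLOYED)
    print(k, round(w_emp, 2), round(b_emp, 2), round(w_une, 2), round(b_une, 2))
```

Because the function accepts any lag, the same code gives the extrapolated values beyond lag seven that are needed for the alternative rotation patterns.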


5.2 Estimators being compared 


To specify a linear composite estimate requires defining a window size $L$, a number of
revisions $R$ and the item for which the estimator is to be optimised. In the tables and graphs
presented below, the composite estimators are denoted by the notation C$L$,$R$ and the simple
estimator by the letter S. For most comparisons, the composite estimator used will be C11,5,
which uses a window of 11 months of data, and allows estimates to be revised five times. 


The composite estimates used are optimised for the quantity proportion unemployed. One 
reason is that the correlations assumed for proportion unemployed may be more typical of other 
variables than are the higher correlations assumed for proportion employed. An estimator 
optimised for proportion unemployed achieves as much as is possible with the lower 
correlations, while still achieving good results for proportion employed. 


The comparisons here are based upon a series of N=90 months, with the middle of the series 
defined to be all but the first 12 and last 12 months (i.e. $m = 78$ is used). These values are
sufficient to give results near those of the ideal situation, which would have N and m very large 
with m considerably smaller than N. 


5.3 Results for various rotation patterns and estimates 


Broad comparison 


Table 5.2 presents the standard errors achieved at the end of the series for various outcome 
measures, for four rotation patterns and with simple and composite estimation. The fourth 
rotation pattern is the 4 in 8 out pattern used in the United States Current Population Survey. 


Standard error measures the variability of an estimate due to sampling error — it is calculated as 
the square root of the variance of the estimate. The standard errors are given as a percentage of 
the standard error of a simple estimate of level. 


Table 5.2: Standard errors of simple and composite estimates at the end of the series,
proportion employed (as % of standard error for simple level estimate)


Pattern and               Original         Quarterly average        End trend
estimator              Level  Movement      Level  Movement      Level  Movement

8 in S                   100        75         88        80         97        19
8 in C11,5                94        66         82        67         89        16
1 in 2 out S             100       130         66        55         72        15
1 in 2 out C11,5          98       127         65        53         71        14
2 in 2 out S             100       102         76        71         81        17
2 in 2 out C11,5          89        78         72        60         78        15
4 in 8 out S             100        85         84        91         94        22
4 in 8 out C11,5          91        70         77        70         84        17


The current '8 in' pattern achieves the best standard errors for lag one movement — this is 
expected, since this design has the greatest overlap at lag one. It does not perform particularly 
well for the longer term measures (level and movement of quarterly average and level and 
movement of trend). For all outcomes the composite estimates give lower standard errors than 
the simple estimates. 


The '1 in 2 out' pattern gives very poor standard errors for lag one movement, but is very good 
for the longer term indicators in this table. Composite estimation achieves relatively little for the 
'1 in 2 out' rotation pattern. It seems unlikely that composite estimation would be used with this 
rotation pattern, given the extra complexity involved. For this reason only the simple estimator 
will be presented for the '1 in 2 out' pattern in later results. 
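

The link between sample overlap and lag one movement can be made concrete with a short calculation. If the level estimates for two successive months have the same variance and a correlation r, the variance of their difference is 2(1 - r) times the level variance, so the level and movement columns of Table 5.2 imply a value of r. The Python sketch below backs out that implied correlation for the simple estimator under each rotation pattern. The function name is hypothetical, the equal-variance assumption is made here for illustration, and the results are approximate because the table entries are rounded.

```python
# Simple-estimator rows of Table 5.2: (level SE, lag one movement SE),
# both as % of the standard error of a simple level estimate.
TABLE_5_2_SIMPLE = {
    "8 in S": (100, 75),
    "1 in 2 out S": (100, 130),
    "2 in 2 out S": (100, 102),
    "4 in 8 out S": (100, 85),
}


def implied_lag_one_correlation(level_se, movement_se):
    """Return r solving movement_se**2 = 2 * level_se**2 * (1 - r).

    Assumption (not from the paper): the level estimates in the two
    months have equal variance.
    """
    return 1.0 - (movement_se / level_se) ** 2 / 2.0


for design, (level_se, movement_se) in TABLE_5_2_SIMPLE.items():
    r = implied_lag_one_correlation(level_se, movement_se)
    print(f"{design:13s} implied lag one correlation = {r:.2f}")
```

On these figures the implied correlation is highest for '8 in S' and lowest for '1 in 2 out S', which matches the ordering of lag one overlap across the patterns.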


The '2 in 2 out' pattern appears as something of a compromise between good long-term 
estimates and good lag one movement estimates. Standard errors under the '2 in 2 out' pattern 
are greatly improved by composite estimation, especially for the lag one movement estimate. 
Composite estimation is required for this rotation pattern to give good standard errors. With 
composite estimation the standard errors compare well with those achieved under other designs 
given here. Only the composite estimator will be presented for the '2 in 2 out' pattern in later 
results. 


Finally, the '4 in 8 out' pattern is shown as an example of what is achieved under other rotation 
patterns. This rotation pattern is much improved by composite estimation, and in fact the 
Current Population Survey of the Bureau of Labor Statistics uses a composite estimator (though 
not of the form described in this paper). 


Comparison to results from simple estimator for current pattern 


The remaining comparisons will be between four designs, each consisting of the choice of a 
rotation pattern and an estimation strategy. 


8 in S          current rotation pattern, simple estimator
8 in C          current rotation pattern, composite estimator
2 in 2 out C    2 in 2 out with composite estimator
1 in 2 out S    1 in 2 out with simple estimator


Graph 5.1 presents a bar chart giving standard errors for the same outcomes as Table 5.2, but 
expressed as a percentage of the standard error achieved under the '8 in S' design. Graph 5.2 is 
the same but for proportion unemployed. 
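

The re-basing used in these graphs is simple arithmetic: each standard error is divided by the corresponding '8 in S' standard error and multiplied by 100. The Python sketch below applies it to the Table 5.2 rows for the four designs (proportion employed, composite rows being C11,5). The function and variable names are hypothetical, and because the table values are rounded the results will only approximate whatever unrounded figures lie behind the graph.

```python
# Outcome columns of Table 5.2, in order.
OUTCOMES = [
    "original level", "original movement",
    "quarterly average level", "quarterly average movement",
    "end trend level", "end trend movement",
]

# Table 5.2 rows for the four designs compared (% of simple level SE).
TABLE_5_2 = {
    "8 in S": [100, 75, 88, 80, 97, 19],
    "8 in C": [94, 66, 82, 67, 89, 16],
    "2 in 2 out C": [89, 78, 72, 60, 78, 15],
    "1 in 2 out S": [100, 130, 66, 55, 72, 15],
}


def relative_to_baseline(table, baseline="8 in S"):
    """Express every row as a percentage of the baseline row, column by column."""
    base = table[baseline]
    return {
        design: [100.0 * value / b for value, b in zip(row, base)]
        for design, row in table.items()
    }


relative = relative_to_baseline(TABLE_5_2)
for design, row in relative.items():
    cells = ", ".join(f"{name}: {value:.0f}" for name, value in zip(OUTCOMES, row))
    print(f"{design}: {cells}")
```

Values below 100 indicate an outcome for which the design improves on the current pattern with simple estimation.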


Graph 5.1: Standard error, proportion employed (relative to '8 in S') 


Graph 5.2: Standard error, proportion unemployed (relative to '8 in S') 


Graph 5.3 presents the standard errors for movement of proportion employed at various lags. 
'2 in 2 out C' performs well for movement at lag three or more, and is not too bad for lag one or 
two. '1 in 2 out S' is good at some specific lags, but very poor at lags one and two. 


Graph 5.3: Standard error (movements), proportion employed (relative to '8 in S') 


Graph 5.4 presents outcomes related to the end trend. Standard errors are given for the trend 
level, the trend movement at lag one and lag three, the trend curvature, the revision of the trend 
level and the revision of lag one trend movement. '1 in 2 out S' has the lowest standard errors for 
most of these measures, but has the highest standard error for trend curvature. The curvature of 
the trend at the end apparently is affected by the poor behaviour of the lag one movement under 
this design. '2 in 2 out C' performs consistently well for these trend outcomes. 


Graph 5.4: Standard error for trend outcomes, proportion employed (relative to '8 in S') 


Comparison between composite estimators 


The composite estimators presented above used 11 months of data and assumed five revisions. 
It may be desirable to use a smaller window, and to use fewer or no revisions. The drawback of 
this is that with a small window or few revisions the estimators will be less optimal, particularly 
for the longer term outcomes. 


Comparisons of five different estimators are given in Graph 5.5 for the '8 in' rotation pattern and 
in Graph 5.6 for the '2 in 2 out' pattern. 


Graph 5.5: Standard error, proportion employed (relative to '8 in S') 


Graph 5.6: Standard error, proportion employed (relative to '8 in S') 

[Bar chart. Vertical axis: % of 8 in S. Series: 2 in 2 out S; 2 in 2 out C5,0; 2 in 2 out C11,0; 2 in 2 out C5,3; 2 in 2 out C11,5. Outcomes: original level, original movement, quarterly average level, quarterly average movement, trend level, trend movement.]


The general picture is similar for both patterns, with standard errors improving as window size 
and number of revisions increase. Revisions are particularly important to achieve the best lag 
one movement estimates. Longer windows always reduce the standard errors, but not by very 
much in the case of no revisions. With revisions, the longer window has the greatest effect for 
long term indicators, particularly movement of quarterly average, and movement of trend. 


5.4 Simulating a series of survey errors 


It is useful to get a feel for the effect of the different designs on the series of survey estimates. To 
do this, sampling error was simulated by drawing from the multivariate normal distribution with 
mean 0 and variance matrix var(V w). To aid comparison across designs, the same random 
numbers were used to simulate from each design. This effectively produces a simulated set of 
rotation group estimates for each rotation pattern under the assumption that the true population 
values were 0 for all time points. These can then be used to produce a series of simple and 
composite estimates. 
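

A minimal sketch of this kind of simulation follows, written in Python with only the standard library. It is a simplification under stated assumptions rather than the procedure used for the graphs: it simulates monthly level errors directly instead of rotation group estimates, and the correlation model (within-group correlation RHO raised to the lag, multiplied by the fraction of sample in common) is hypothetical. The shared vector of standard normal draws plays the role of using the same random numbers for each design.

```python
import math
import random


def correlation_matrix(n_months, corr_at_lag):
    """Build the matrix whose (i, j) entry is corr_at_lag(|i - j|)."""
    return [[corr_at_lag(abs(i - j)) for j in range(n_months)]
            for i in range(n_months)]


def cholesky(matrix):
    """Lower-triangular factor L with L * L^T = matrix (positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def simulate_errors(matrix, shocks):
    """Turn independent N(0, 1) shocks into errors with the given correlations."""
    lower = cholesky(matrix)
    return [sum(lower[i][k] * shocks[k] for k in range(i + 1))
            for i in range(len(shocks))]


RHO = 0.9  # hypothetical month-to-month correlation within a rotation group


def corr_8_in(lag):
    # '8 in': a fraction (8 - lag) / 8 of the sample is common at a given lag.
    return max(0, 8 - lag) / 8 * RHO ** lag


def corr_1_in_2_out(lag):
    # '1 in 2 out': the same group returns only every third month.
    return RHO ** lag if lag % 3 == 0 else 0.0


def rms_lag_one_movement(series):
    """Root mean square of month-to-month changes in a series."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


N_MONTHS = 60
rng = random.Random(2024)
shocks = [rng.gauss(0.0, 1.0) for _ in range(N_MONTHS)]  # shared by both designs

errors_8_in = simulate_errors(correlation_matrix(N_MONTHS, corr_8_in), shocks)
errors_1_in_2_out = simulate_errors(
    correlation_matrix(N_MONTHS, corr_1_in_2_out), shocks)

print("RMS lag one movement, 8 in S:      ", rms_lag_one_movement(errors_8_in))
print("RMS lag one movement, 1 in 2 out S:", rms_lag_one_movement(errors_1_in_2_out))
```

Errors are in units of the standard error of a single level estimate, since both matrices have ones on the diagonal. Reusing the same shocks means differences between the two simulated series reflect the correlation structure of the designs rather than different random draws.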


Graphs 5.7 and 5.8 show simulated series of sampling errors for the '8 in S' and '1 in 2 out S' 
designs respectively. Superimposed is the mid trend applied to these sampling errors, based on 
90 points (including data from 12 months beyond the last point shown). Under our model 
sampling error is added to the true series, and the sampling error on the trend is added to the 
trend. Thus the graphs effectively show a single simulation of the effect of sampling error on the 
observed original and trend series. The sampling errors are shown relative to the standard error 
of a single level estimate. 
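

Because the trend is treated as a linear filter, the sampling error on the trend is that same filter applied to the series of sampling errors. The Python sketch below illustrates the idea with a symmetric 13-term Henderson moving average, the kind of trend filter X11 applies to monthly series. Using this single filter is an assumption made here for illustration; it is not the full mid trend calculation used for the graphs, and the function names are hypothetical.

```python
def henderson_weights(terms=13):
    """Weights of a symmetric Henderson moving average with an odd number of terms."""
    if terms % 2 == 0 or terms < 5:
        raise ValueError("terms must be an odd number of at least 5")
    m = (terms - 1) // 2
    n = m + 2
    denom = (8.0 * n * (n ** 2 - 1) * (4 * n ** 2 - 1)
             * (4 * n ** 2 - 9) * (4 * n ** 2 - 25))
    weights = []
    for j in range(-m, m + 1):
        numer = (315.0 * ((n - 1) ** 2 - j ** 2) * (n ** 2 - j ** 2)
                 * ((n + 1) ** 2 - j ** 2) * (3 * n ** 2 - 16 - 11 * j ** 2))
        weights.append(numer / denom)
    return weights


def apply_symmetric_filter(series, weights):
    """Apply a symmetric filter; output covers only points with a full window."""
    m = len(weights) // 2
    return [sum(w * series[t + k - m] for k, w in enumerate(weights))
            for t in range(m, len(series) - m)]


weights = henderson_weights(13)

# A straight line passes through the filter unchanged, so only the
# sampling error (not a smooth underlying trend) is altered by filtering.
line = [2.0 * t + 1.0 for t in range(30)]
filtered_line = apply_symmetric_filter(line, weights)
print(filtered_line[0], line[6])
```

Applying `apply_symmetric_filter` to a simulated series of sampling errors gives the corresponding error on the trend for all but the first and last six months, where a full window is not available.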


Graph 5.7: Simulated effect of sampling error, 8 in S design, proportion employed 


Graph 5.8: Simulated effect of sampling error, 1 in 2 out S design, proportion employed 


The comparison between '8 in S' and '1 in 2 out S' is quite instructive. The '8 in S' design gives 
superficially quite well behaved estimates, with small movements between successive points. 
The problem is that there is a clear longer term movement in the underlying series, induced by 
the correlation between successive estimates. This apparent trend is spurious, since effectively 
the true population values are all zero for this simulation. 


The '1 in 2 out S' design, in contrast, has large movements at lag one, and shows an obvious 
autocorrelation at lag three. It is harder to discern any great movement in the trend effect over 
the period shown — this is good if our objective is to minimise the effect of sampling error on 
trend. 


The simulations shown are typical of a number of simulations that were run for these two 
designs. Simulations for the '8 in C' design are similar to those for the '8 in S' design but with 
slightly reduced variability. The '2 in 2 out C' design displays behaviour between the extremes 
represented by the '8 in S' design and the '1 in 2 out S' design. The simulations are relevant 
because if the '1 in 2 out S' design were adopted, users would be faced with data that look very 
different from the current series, and considerably more volatile. For the current estimates quite a 
lot of sampling error passes into the trend series. This would be reduced under the '1 in 2 out 
S' design, where more of the sampling error passes into the irregular component of the series. 
Thus the improved trend series in the '1 in 2 out S' design is achieved at the cost of increased 
irregularity of the original estimates. 


6 Conclusions 


The survey designer is faced with the task of providing estimates that are as useful as possible for 
the purposes the survey aims to achieve. For a repeated survey, most users are interested in 
monitoring change over time. This does not necessarily mean that the key interest of users in a 
monthly survey should be month-to-month movements. This paper suggests that survey designs 
should in many cases be aimed at achieving good estimates of longer term change. 


The paper suggested a number of outcomes that are of interest to users and that could be 
assessed in designing a repeating survey. It introduced the X11 trend as a surrogate for the sorts 
of analysis that users do to determine the longer term behaviour of a series, and specified 
outcomes related to the trend. It also examined the effects on the various outcomes of changing 
two aspects of the survey design — the rotation pattern and the estimator. 


In the LFS example there were three main alternatives to the current rotation pattern and 
estimator. The first was to add composite estimation, which improved standard errors across all 
outcomes. The second was to move to the '2 in 2 out' pattern with composite estimation, which 
improved the longer term outcomes further but was not quite as good for lag one movement. 
The third was to move to a '1 in 2 out' pattern with simple estimation, which could 
achieve further improvements to the standard error of most longer term outcomes but was very 
poor for lag one movement. This is very noticeable in the simulated series of sampling errors, 
where the '1 in 2 out' series appears quite irregular. 


There is no magic answer that is best for every possible use of the data. Any design is a trade off 
— between monthly movement and longer term outcomes, between complexity and simplicity, 
between cost and accuracy. Designers of repeated surveys should keep in mind the uses made 
of the data and allow that to influence design choices. 